
State Machines and Workflow Modeling in .NET Systems

State is one of those topics that looks simple when the system is small and becomes one of the biggest sources of bugs when the system becomes real.

At first, people model behavior with booleans:

IsRunning
IsStopped
HasError
IsPaused
IsCompleted

Then six months later they discover impossible combinations like:

IsRunning = true and IsStopped = true
IsCompleted = true but CurrentStep = Preparing
UI says “Start” is available while hardware is already busy

That is exactly why state modeling matters. It is really about making illegal situations unrepresentable, or at least much harder to create.

PART 1 — CORE CONCEPTS RECAP

Finite state machine (FSM)

A finite state machine is a model where a system can be in one of a finite set of states, and it can move between those states only through defined transitions.

This is the key idea:

at any moment, the system has a current state
something happens, usually an event
based on the current state and event, the system may transition to another state
if the transition is not allowed, it should be rejected

Example:

A machine controller might have:

Idle
Preparing
Running
Paused
Completed
Error

And events like:

StartRequested
PreparationSucceeded
PauseRequested
ResumeRequested
StopRequested
FaultDetected

The machine should not go from Idle directly to Paused unless you explicitly define that transition.

That is the real value: behavior becomes explicit, not accidental.

States, transitions, events

State

A state represents the current mode or phase of the system.

Examples:

machine state: Idle, Running, Error
workflow state: Created, Approved, Rejected
device connection state: Disconnected, Connecting, Connected

A good state answers: “What is the system currently allowed to do?”

Event

An event is something that happens and may cause a transition.

Examples:

user clicks Start
PLC sends Ready signal
timeout occurs
validation fails
external API responds

An event is not the same as a state. A common mistake is mixing them.

Bad thinking:

“Start” is a state

Correct thinking:

“Start button clicked” is an event
“Running” is a state

Transition

A transition is the defined movement from one state to another in response to an event, often subject to conditions.

Example:

Idle + StartRequested -> Preparing
Preparing + PreparationSucceeded -> Running
Running + FaultDetected -> Error

That is the core of state machine design.
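The three example transitions can be captured in one pure function. This is a minimal sketch that assumes hypothetical `MachineState` and `MachineEvent` enums, named after the lists earlier in this part. Returning `null` stands for a rejected transition.

```csharp
using System;

// Hypothetical enums named after the example states and events above.
public enum MachineState { Idle, Preparing, Running, Paused, Completed, Error }
public enum MachineEvent { StartRequested, PreparationSucceeded, PauseRequested, ResumeRequested, StopRequested, FaultDetected }

public static class MachineTransitions
{
    // Pure function: returns the next state, or null when (state, event) is not a defined transition.
    public static MachineState? Next(MachineState state, MachineEvent evt) => (state, evt) switch
    {
        (MachineState.Idle, MachineEvent.StartRequested) => MachineState.Preparing,
        (MachineState.Preparing, MachineEvent.PreparationSucceeded) => MachineState.Running,
        (MachineState.Running, MachineEvent.FaultDetected) => MachineState.Error,
        _ => null, // everything else, e.g. Idle + PauseRequested, is rejected
    };
}
```

The function has no side effects, so a unit test can check every (state, event) pair directly.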

Deterministic vs non-deterministic systems

Deterministic

A deterministic state machine means:

Given:

current state
event
relevant inputs/conditions

the next state is uniquely determined.

Example:

In Running, if event is StopRequested, always go to Stopping

This is what most production systems should aim for.

Non-deterministic

A non-deterministic system means the same state and event may lead to multiple possible next states.

In automata theory, non-deterministic state machines are a standard, well-defined concept. In production application design, the same behavior usually means one of two things:

you have hidden inputs that are not modeled
your design is incomplete or ambiguous

Example: If Running + InspectionFinished sometimes goes to Completed and sometimes to Error, then the real model is probably:

Running + InspectionFinished + ResultValid = true -> Completed
Running + InspectionFinished + ResultValid = false -> Error

So the system was not actually non-deterministic. The model was under-specified.
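One way to make this refinement concrete is to carry the previously hidden input on the event itself and branch on it with a guard. The `InspectionFinished` record and its `ResultValid` flag are assumptions for illustration.

```csharp
using System;

public enum InspectionPhase { Running, Completed, Error }

// Hypothetical event that carries the formerly hidden input as explicit data.
public sealed record InspectionFinished(bool ResultValid);

public static class InspectionRules
{
    // The outcome depends only on (state, event, event data), so the transition is deterministic.
    public static InspectionPhase OnInspectionFinished(InspectionPhase state, InspectionFinished evt) => state switch
    {
        InspectionPhase.Running when evt.ResultValid => InspectionPhase.Completed,
        InspectionPhase.Running => InspectionPhase.Error,
        _ => throw new InvalidOperationException($"InspectionFinished is not valid in state {state}."),
    };
}
```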

Senior engineers generally push workflow logic toward determinism because deterministic systems are easier to test, reason about, reproduce, and recover.

PART 2 — STATE REPRESENTATION

There are several ways to represent state in .NET. The right one depends on system complexity.

Enum-based state

Example:

csharp

public enum InspectionState
{
    Idle,
    Preparing,
    Running,
    Paused,
    Completed,
    Error
}

This is the most common representation.

Pros

simple
easy to serialize
easy to persist in DB
easy to inspect in logs
cheap and fast
good for most workflows

Cons

behavior tends to spread across switch statements
rules become duplicated across services, UI, handlers
easy to create “God switch” logic
hard to model state-specific behavior cleanly as complexity grows

Example of typical enum usage:

csharp

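// 'command' and 'signal' are illustrative inputs (e.g. a UI command and a PLC signal);
// Command and MachineSignal are hypothetical enums.
switch (currentState)
{
    case InspectionState.Idle:
        if (command == Command.Start) currentState = InspectionState.Preparing;
        break;
    case InspectionState.Preparing:
        if (signal == MachineSignal.Ready) currentState = InspectionState.Running;
        break;
}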

This is fine for modest systems.

Object-based state

Instead of storing only enum State, you represent each state as an object.

csharp

public interface IInspectionState
{
    IInspectionState Handle(InspectionEvent evt, InspectionContext context);
}

Then concrete states:

csharp

public sealed class IdleState : IInspectionState
{
    public IInspectionState Handle(InspectionEvent evt, InspectionContext context)
    {
        return evt switch
        {
            StartRequested => new PreparingState(),
            _ => this
        };
    }
}

Pros

behavior is localized per state
avoids giant switch blocks
easier to attach state-specific rules
good when each state has distinct behavior, entry/exit actions, validation

Cons

more classes
harder to persist directly
can become over-engineered
allocations and indirection may add complexity without enough benefit
some teams struggle to navigate it

This model is strongest when behavior differs heavily by state, not just state names.
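The interface above leaves `PreparingState`, the event types, and the context undefined. Below is a self-contained sketch of the same idea. The event records, the `InspectionContext` log, and the `InspectionStateMachine` wrapper are hypothetical. Entry actions run only when the state object actually changes.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical event types; each state decides which ones it accepts.
public abstract record InspectionEvent;
public sealed record StartRequested : InspectionEvent;
public sealed record PreparationSucceeded : InspectionEvent;
public sealed record FaultDetected(string Code) : InspectionEvent;

// Hypothetical shared context; here it only records entry actions.
public sealed class InspectionContext
{
    public List<string> Log { get; } = new();
}

public interface IInspectionState
{
    string Name { get; }
    void OnEnter(InspectionContext context);
    IInspectionState Handle(InspectionEvent evt, InspectionContext context);
}

public sealed class IdleState : IInspectionState
{
    public string Name => "Idle";
    public void OnEnter(InspectionContext context) => context.Log.Add("Idle: waiting for start");
    public IInspectionState Handle(InspectionEvent evt, InspectionContext context) => evt switch
    {
        StartRequested => new PreparingState(),
        _ => this, // events Idle does not handle are ignored in this sketch
    };
}

public sealed class PreparingState : IInspectionState
{
    public string Name => "Preparing";
    public void OnEnter(InspectionContext context) => context.Log.Add("Preparing: loading recipe");
    public IInspectionState Handle(InspectionEvent evt, InspectionContext context) => evt switch
    {
        PreparationSucceeded => new RunningState(),
        FaultDetected f => new ErrorState(f.Code),
        _ => this,
    };
}

public sealed class RunningState : IInspectionState
{
    public string Name => "Running";
    public void OnEnter(InspectionContext context) => context.Log.Add("Running: accepting frames");
    public IInspectionState Handle(InspectionEvent evt, InspectionContext context) => evt switch
    {
        FaultDetected f => new ErrorState(f.Code),
        _ => this,
    };
}

public sealed class ErrorState : IInspectionState
{
    public ErrorState(string code) => Code = code;
    public string Code { get; }
    public string Name => "Error";
    public void OnEnter(InspectionContext context) => context.Log.Add($"Error: {Code}");
    public IInspectionState Handle(InspectionEvent evt, InspectionContext context) => this; // recovery omitted
}

public sealed class InspectionStateMachine
{
    private readonly InspectionContext _context = new();

    public InspectionStateMachine()
    {
        Current = new IdleState();
        Current.OnEnter(_context);
    }

    public IInspectionState Current { get; private set; }
    public IReadOnlyList<string> Log => _context.Log;

    public void Fire(InspectionEvent evt)
    {
        var next = Current.Handle(evt, _context);
        if (ReferenceEquals(next, Current))
            return; // no transition, so no entry action
        Current = next;
        Current.OnEnter(_context);
    }
}
```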

Pros/cons summary

Enum-based

Best when:

number of states is moderate
transitions are straightforward
persistence and reporting matter
team wants simple operational visibility

Object-based

Best when:

behavior is rich and state-specific
you need entry/exit logic
workflow logic is getting tangled
different states expose different capabilities

When to use the State Pattern (OO)

Use the classic State Pattern when:

state-specific behavior is large enough that switch statements are becoming unreadable
each state has different allowed operations
you want polymorphism instead of condition-heavy code
transitions have rich rules and side effects

Do not use it just because it is a famous pattern.

If your workflow has 5 states and 8 clear transitions, an enum plus transition table is often better than 20 classes.

Senior engineers avoid pattern worship. They pick the simplest model that still preserves correctness.

PART 3 — TRANSITION MODELING

This is where many systems succeed or fail.

The main question is not “How do I store state?”

It is: “How do I define what transitions are legal?”

Explicit transition tables

A transition table makes allowed transitions visible in one place.

Example:

csharp

public enum InspectionState { Idle, Preparing, Running, Paused, Completed, Error }
public enum InspectionTrigger { Start, Prepared, Pause, Resume, Complete, Fail, Reset }

public static class InspectionTransitions
{
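    // Exposed read-only so no caller can add or change transitions at runtime.
    public static readonly IReadOnlyDictionary<(InspectionState, InspectionTrigger), InspectionState> Map =
        new Dictionary<(InspectionState, InspectionTrigger), InspectionState>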
    {
        { (InspectionState.Idle, InspectionTrigger.Start), InspectionState.Preparing },
        { (InspectionState.Preparing, InspectionTrigger.Prepared), InspectionState.Running },
        { (InspectionState.Running, InspectionTrigger.Pause), InspectionState.Paused },
        { (InspectionState.Paused, InspectionTrigger.Resume), InspectionState.Running },
        { (InspectionState.Running, InspectionTrigger.Complete), InspectionState.Completed },
        { (InspectionState.Running, InspectionTrigger.Fail), InspectionState.Error },
        { (InspectionState.Error, InspectionTrigger.Reset), InspectionState.Idle }
    };
}

Why this is powerful

legality is explicit
invalid transitions are easy to reject
testing becomes simple
reviewers can inspect the workflow without reading the whole codebase

This is often better than burying transition logic inside many handlers.
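With the table in place, rejecting an illegal transition takes a single lookup, and a test can walk the whole state/trigger grid. This is a minimal sketch that reuses the table's enums. `InspectionWorkflow.TryFire` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

public enum InspectionState { Idle, Preparing, Running, Paused, Completed, Error }
public enum InspectionTrigger { Start, Prepared, Pause, Resume, Complete, Fail, Reset }

public static class InspectionWorkflow
{
    // Same legal transitions as the table above, reachable only through TryFire.
    private static readonly IReadOnlyDictionary<(InspectionState, InspectionTrigger), InspectionState> Map =
        new Dictionary<(InspectionState, InspectionTrigger), InspectionState>
        {
            { (InspectionState.Idle, InspectionTrigger.Start), InspectionState.Preparing },
            { (InspectionState.Preparing, InspectionTrigger.Prepared), InspectionState.Running },
            { (InspectionState.Running, InspectionTrigger.Pause), InspectionState.Paused },
            { (InspectionState.Paused, InspectionTrigger.Resume), InspectionState.Running },
            { (InspectionState.Running, InspectionTrigger.Complete), InspectionState.Completed },
            { (InspectionState.Running, InspectionTrigger.Fail), InspectionState.Error },
            { (InspectionState.Error, InspectionTrigger.Reset), InspectionState.Idle },
        };

    // Returns false and leaves 'next' equal to 'current' for any pair missing from the table.
    public static bool TryFire(InspectionState current, InspectionTrigger trigger, out InspectionState next)
    {
        if (Map.TryGetValue((current, trigger), out var target))
        {
            next = target;
            return true;
        }
        next = current;
        return false;
    }
}
```

A grid test that counts legal pairs fails as soon as someone adds or removes a transition without updating the test. That failure is a useful review signal.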

Guard conditions

A transition may be structurally valid but conditionally forbidden.

Example:

Idle -> Preparing is allowed only if machine is connected and recipe is loaded

That is a guard.

csharp

if (state == InspectionState.Idle &&
    trigger == InspectionTrigger.Start &&
    machine.IsConnected &&
    recipe != null)
{
    state = InspectionState.Preparing;
}

Better is to make the guard part of the transition definition conceptually:

from Idle
on Start
only if MachineConnected && RecipeLoaded
transition to Preparing

Guards should answer: “Under what condition is this transition legal?”

They should not contain unrelated side effects.

Bad guard:

validates machine state
sends hardware command
updates UI
writes DB row

That is not a guard anymore. That is business logic soup.
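The conceptual "from / on / only if / to" definition can be stored as data, with the guard as a pure predicate over a read-only snapshot. `GuardContext` and `GuardedTransition` are hypothetical names. The guard only reads facts and never performs side effects.

```csharp
using System;
using System.Linq;

public enum SetupState { Idle, Preparing }
public enum SetupTrigger { Start }

// Hypothetical read-only facts that guards may inspect.
public sealed record GuardContext(bool MachineConnected, bool RecipeLoaded);

// from / on / only if / to, kept as data. The guard is a pure predicate.
public sealed record GuardedTransition(
    SetupState From, SetupTrigger On, Func<GuardContext, bool> Guard, string GuardName, SetupState To);

public static class GuardedWorkflow
{
    private static readonly GuardedTransition[] Transitions =
    {
        new(SetupState.Idle, SetupTrigger.Start,
            c => c.MachineConnected && c.RecipeLoaded, "MachineConnected && RecipeLoaded",
            SetupState.Preparing),
    };

    // Distinguishes "no such transition" from "transition exists but its guard failed".
    public static (bool Accepted, SetupState Next, string Reason) Fire(
        SetupState current, SetupTrigger trigger, GuardContext ctx)
    {
        var candidates = Transitions.Where(x => x.From == current && x.On == trigger).ToArray();
        if (candidates.Length == 0)
            return (false, current, $"No transition from {current} on {trigger}");

        var passing = candidates.FirstOrDefault(x => x.Guard(ctx));
        if (passing is null)
            return (false, current, "Guard failed: " + string.Join(" / ", candidates.Select(x => x.GuardName)));

        return (true, passing.To, "Guard passed: " + passing.GuardName);
    }
}
```

The returned reason string is what a transition log or the UI can show when a request is refused.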

Enforcing invariants

An invariant is something that must always be true.

Examples:

there can be only one active inspection at a time
Completed inspections cannot accept new frames
Running requires an active recipe and live machine session
Paused implies previous state was Running

Invariants matter more than transitions alone.

You can have a legal transition graph and still violate system correctness if state data is inconsistent.

Example: State says Running, but CurrentJobId is null. That is a broken invariant.

So a good transition function should validate both:

is this transition allowed?
will the resulting state still satisfy invariants?

That is senior-level thinking.
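Here is a sketch of checking the resulting state before committing it, using a few of the invariants listed above. `JobSnapshot` and its fields are assumptions. That includes `PausedFrom`, which is one hypothetical way to express "Paused implies previous state was Running".

```csharp
using System;
using System.Collections.Generic;

public enum JobStatus { Idle, Running, Paused, Completed }

// Hypothetical workflow snapshot; field names are illustrative.
public sealed record JobSnapshot(JobStatus Status, string? CurrentJobId, string? RecipeId, JobStatus? PausedFrom);

public static class JobInvariants
{
    // Returns every violated invariant; an empty list means the snapshot is consistent.
    public static IReadOnlyList<string> Check(JobSnapshot s)
    {
        var violations = new List<string>();
        if (s.Status == JobStatus.Running && s.CurrentJobId is null)
            violations.Add("Running requires a CurrentJobId");
        if (s.Status == JobStatus.Running && s.RecipeId is null)
            violations.Add("Running requires an active recipe");
        if (s.Status == JobStatus.Paused && s.PausedFrom != JobStatus.Running)
            violations.Add("Paused must have been entered from Running");
        return violations;
    }

    // Commits the candidate only if it satisfies all invariants; otherwise the transition is rejected.
    public static JobSnapshot Apply(JobSnapshot current, JobSnapshot candidate)
    {
        var violations = Check(candidate);
        if (violations.Count > 0)
            throw new InvalidOperationException(
                $"Transition {current.Status} -> {candidate.Status} rejected: {string.Join("; ", violations)}");
        return candidate;
    }
}
```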

PART 4 — EVENT-DRIVEN STATE MACHINES

Real systems are not driven only by function calls. They are driven by events:

user actions
machine signals
timers
callbacks
external messages
async completions

This is where clean diagrams become messy reality.

Handling external events (machine signals)

Suppose hardware sends:

Ready
CycleStarted
InspectionDone
FaultRaised

These are asynchronous and may arrive:

late
duplicated
out of order
on background threads

So the state machine must not assume events are always clean.

Example: If InspectionDone arrives while the system is still Preparing, you have a few possibilities:

ignore it
log and reject it
move to Error
buffer until relevant

Which one is correct depends on the domain. But it must be explicit.

A robust state machine treats external events as untrusted input.
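This sketch treats machine signals as untrusted input. It assumes the device stamps each signal with a monotonically increasing sequence number, which many protocols do not provide. Duplicate and stale signals are rejected. A signal that is not valid in the current state is logged and rejected, which is one of the explicit policies listed above. `SignalGate` and `MachineSignal` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public enum ControllerState { Preparing, Running, Completed }

// Hypothetical envelope: Sequence is assumed to increase monotonically per device.
public sealed record MachineSignal(long Sequence, string Name);

public sealed class SignalGate
{
    private long _lastSequence = -1;

    public ControllerState State { get; private set; } = ControllerState.Preparing;
    public List<string> Rejections { get; } = new();

    public void Accept(MachineSignal signal)
    {
        // Duplicates and older (out-of-order) signals are rejected explicitly, never silently applied.
        if (signal.Sequence <= _lastSequence)
        {
            Rejections.Add($"stale or duplicate #{signal.Sequence} {signal.Name}");
            return;
        }
        _lastSequence = signal.Sequence;

        ControllerState? next = (State, signal.Name) switch
        {
            (ControllerState.Preparing, "Ready") => ControllerState.Running,
            (ControllerState.Running, "InspectionDone") => ControllerState.Completed,
            _ => null,
        };

        if (next is null)
            Rejections.Add($"{signal.Name} not valid in {State}"); // policy in this sketch: log and reject
        else
            State = next.Value;
    }
}
```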

Handling user actions

Users also generate events:

Start clicked
Stop clicked
Retry clicked
Reset clicked

These events may conflict with machine events.

Example:

operator clicks Stop
machine simultaneously reports Complete

Now what is the final state?

Possible outcomes:

Stopping
Completed
Error
Cancelled

If the transition ordering is not designed, you get race-condition bugs that reproduce once a month in production and take weeks to diagnose.

Ordering and concurrency issues

This is the real-world problem:

Events are not just “what happened.” They are “what happened, in what order, under what concurrency model.”

Questions you must answer:

Are events processed one at a time?
Is ordering guaranteed?
Can two threads transition state concurrently?
Is event handling reentrant?
Can one transition trigger another event synchronously?

A very common production approach is:

all workflow events go through a single serialized event-processing loop
transitions are processed one at a time
state changes become atomic at the workflow level

This dramatically simplifies reasoning.

In .NET, this can be implemented with:

Channel<T>
ActionBlock<T> (TPL Dataflow)
dedicated event loop task
mailbox pattern
actor-like model

That is often safer than letting many threads mutate workflow state directly.
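Below is a sketch of a serialized event loop built on `System.Threading.Channels`. Any thread may post events, but only one consumer reads them and mutates state, so transitions never overlap. The bounded capacity of 100 and `BoundedChannelFullMode.Wait` are illustrative choices that provide backpressure. The state and event enums are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Channels;
using System.Threading.Tasks;

public enum LoopState { Idle, Running, Stopped }
public enum LoopEvent { Start, Stop }

public sealed class SerializedWorkflow
{
    private readonly Channel<LoopEvent> _events = Channel.CreateBounded<LoopEvent>(
        new BoundedChannelOptions(100)
        {
            SingleReader = true,
            FullMode = BoundedChannelFullMode.Wait, // backpressure: writers wait while the queue is full
        });

    // Written only by ProcessAsync; other threads may observe a slightly stale value.
    public LoopState State { get; private set; } = LoopState.Idle;
    public List<string> History { get; } = new();

    public ValueTask PostAsync(LoopEvent evt) => _events.Writer.WriteAsync(evt);
    public void Complete() => _events.Writer.Complete();

    // The single consumer: events are handled strictly one at a time, in arrival order.
    public async Task ProcessAsync()
    {
        await foreach (var evt in _events.Reader.ReadAllAsync())
        {
            var next = (State, evt) switch
            {
                (LoopState.Idle, LoopEvent.Start) => LoopState.Running,
                (LoopState.Running, LoopEvent.Stop) => LoopState.Stopped,
                _ => State, // rejected: stay in the current state
            };
            History.Add($"{State} + {evt} -> {next}");
            State = next;
        }
    }
}
```

In a real host, `ProcessAsync` would typically run for the lifetime of the workflow instance, for example inside a background service.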

PART 5 — CONCURRENCY & STATE

This is where state bugs become nasty.

A workflow may be logically simple but still fail because state mutation is not thread-safe.

Race conditions in state transitions

A race condition happens when correctness depends on timing between threads.

Example:

Two threads both observe:

current state = Idle

Thread A handles StartRequested while Thread B handles ResetRequested.

If both read the same old state and both write new states independently, final state depends on timing.

Possible bad outcomes:

lost transition
duplicate commands to hardware
invalid side effects executed twice

This is why “read current state, decide, write current state” is dangerous unless protected.

Ensuring thread-safe state changes

There are a few common models.

1. Lock-based synchronization

csharp

private readonly object _sync = new();

public void Handle(Event evt)
{
    lock (_sync)
    {
        Transition(evt);
    }
}

Good:

simple
effective
easy to reason about for one workflow instance

Bad:

can deadlock if external code is called inside lock
hurts scalability if too coarse
cannot wrap async code: C# rejects await inside a lock block (compiler error CS1996)

Golden rule: Do not await inside a normal lock. Do not call external components while holding state lock if they can reenter.
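For async handlers, `SemaphoreSlim(1, 1)` is a common replacement for `lock`. The sketch below also follows the second half of the rule. It claims the transition under the gate, releases the gate, makes the external call, and re-enters only to commit the result. `AsyncDoorController` and the `moveHardware` callback are hypothetical.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public enum DoorState { Closed, Opening, Open, Faulted }

public sealed class AsyncDoorController
{
    // Async-friendly mutual exclusion; unlike lock, it can be awaited.
    private readonly SemaphoreSlim _gate = new(1, 1);

    public DoorState State { get; private set; } = DoorState.Closed;

    // 'moveHardware' stands in for a hypothetical async hardware call.
    public async Task<bool> OpenAsync(Func<Task> moveHardware)
    {
        await _gate.WaitAsync();
        try
        {
            if (State != DoorState.Closed)
                return false; // reject: an open is already in progress or done
            State = DoorState.Opening; // claim the transition before releasing the gate
        }
        finally
        {
            _gate.Release();
        }

        try
        {
            // Runs without holding the gate, so a reentrant callback cannot deadlock.
            await moveHardware();
        }
        catch
        {
            await SetAsync(DoorState.Faulted); // failure becomes an explicit state
            throw;
        }

        await SetAsync(DoorState.Open);
        return true;
    }

    private async Task SetAsync(DoorState next)
    {
        await _gate.WaitAsync();
        try { State = next; }
        finally { _gate.Release(); }
    }
}
```

The intermediate `Opening` state blocks a second concurrent open while the hardware call runs outside the gate.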

2. Single-threaded event processing

All events are queued and processed by one logical worker.

Good:

avoids most race conditions
preserves order
easier mental model
great for workflow engines and device controllers

Bad:

you must design backpressure and queue growth
long handlers block later events
side effects must still be controlled carefully

For stateful workflows, this is often the cleanest approach.

3. Atomic compare-and-swap style

Useful when state is a small immutable value.

Conceptually:

read old state
compute new state
update only if old state is still unchanged

In .NET this is often based on Interlocked.CompareExchange.

Good:

high performance
no coarse lock

Bad:

difficult once transitions involve multiple fields or side effects
easy to get wrong
not ideal for rich workflows

This is more common in low-level concurrent infrastructure than business workflows.
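Here is a sketch of the compare-and-swap style on a small immutable record using `Interlocked.CompareExchange`. The transition succeeds only if no other thread replaced the reference between the read and the write. The `LinkState` record is hypothetical. Note that nothing here can safely trigger a side effect, which is why this style suits infrastructure better than rich workflows.

```csharp
using System;
using System.Threading;

public enum LinkStatus { Disconnected, Connecting, Connected }

// Immutable snapshot: a transition swaps the whole reference instead of mutating fields.
public sealed record LinkState(LinkStatus Status, int Version);

public sealed class CasLink
{
    private LinkState _state = new(LinkStatus.Disconnected, 0);

    public LinkState Current => Volatile.Read(ref _state);

    // Retries until the swap succeeds or the observed state no longer allows the transition.
    public bool TryTransition(LinkStatus from, LinkStatus to)
    {
        while (true)
        {
            var observed = Volatile.Read(ref _state);
            if (observed.Status != from)
                return false; // another thread already moved the state
            var proposed = observed with { Status = to, Version = observed.Version + 1 };
            if (ReferenceEquals(Interlocked.CompareExchange(ref _state, proposed, observed), observed))
                return true;
            // Lost the race: loop and re-evaluate against the new state.
        }
    }
}
```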

Atomic transitions

An atomic transition means the system never exposes a half-transitioned state.

Bad example:

update DB
update in-memory state
send machine command
update UI

If step 3 fails, what state is the system really in?

You need a transaction boundary, even if not a database transaction.

In workflow systems, atomicity usually means:

validate transition
produce new state + intended effects
commit state change
execute side effects in controlled order
if side effect fails, move to compensating/error path explicitly

A useful design is to separate:

decision: “what transition should happen?”
effect: “what external actions should be performed because of that?”

That makes transitions more testable and predictable.
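This minimal sketch separates decision from effect. The decider is pure: it returns the new state plus a list of effects as plain data, and executes nothing itself. The `Effect` records, the `"START_CYCLE"` command, and the string triggers are illustrative assumptions.

```csharp
using System;
using System.Collections.Generic;

public enum RunState { Idle, Running, Completed }

// Effects are data describing what should happen; nothing runs inside the decider.
public abstract record Effect;
public sealed record SendMachineCommand(string Command) : Effect;
public sealed record WriteAudit(string Message) : Effect;

public sealed record Decision(bool Accepted, RunState NewState, IReadOnlyList<Effect> Effects);

public static class RunDecider
{
    // Pure: the same inputs always give the same Decision, so it can be tested without hardware or a database.
    public static Decision Decide(RunState state, string trigger) => (state, trigger) switch
    {
        (RunState.Idle, "Start") => new Decision(true, RunState.Running, new Effect[]
        {
            new WriteAudit("Idle -> Running"),
            new SendMachineCommand("START_CYCLE"),
        }),
        (RunState.Running, "Finish") => new Decision(true, RunState.Completed, new Effect[]
        {
            new WriteAudit("Running -> Completed"),
        }),
        _ => new Decision(false, state, Array.Empty<Effect>()),
    };
}
```

An executor would then commit `NewState`, run the effects in order, and report any failure back as a new event rather than patching state directly.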

PART 6 — STATE PERSISTENCE

If your system crashes, can it recover correctly?

That is the real test of workflow design.

Persisting state for recovery

For long-running workflows, state usually must survive:

process crash
machine restart
OS reboot
deployment restart
power loss

At minimum you usually persist:

workflow instance ID
current state
important state data
version / concurrency token
timestamp
last processed event or sequence number

Example persisted row:

WorkflowId
CurrentState = Running
RecipeId = RX-101
CurrentLotId = LOT-5
StepIndex = 4
Version = 17

Persistence should support answering: “What was the last known durable state?”
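The version / concurrency token lets a save detect that another writer got there first. The in-memory store below stands in for a real table, and the `WorkflowRow` shape mirrors the example row above. A database would typically perform the same check with a conditional update that matches on the expected version.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical persisted row mirroring the example above.
public sealed record WorkflowRow(string WorkflowId, string CurrentState, string? RecipeId, int StepIndex, int Version);

// In-memory stand-in for a workflow table with optimistic concurrency.
public sealed class WorkflowStore
{
    private readonly Dictionary<string, WorkflowRow> _rows = new();
    private readonly object _sync = new();

    public WorkflowRow? Load(string id)
    {
        lock (_sync) return _rows.TryGetValue(id, out var row) ? row : null;
    }

    // Saves only if the stored version still equals expectedVersion (0 means "row must not exist yet").
    // Returns the stored row with its version bumped, or null on conflict.
    public WorkflowRow? TrySave(WorkflowRow row, int expectedVersion)
    {
        lock (_sync)
        {
            var currentVersion = _rows.TryGetValue(row.WorkflowId, out var existing) ? existing.Version : 0;
            if (currentVersion != expectedVersion)
                return null; // stale writer: reload, re-decide, retry
            var stored = row with { Version = expectedVersion + 1 };
            _rows[row.WorkflowId] = stored;
            return stored;
        }
    }
}
```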

Restoring workflows after crash/restart

Recovery is not just loading the last enum from DB.

You must also decide:

what external operations may already have happened?
what in-flight event was partially processed?
is the machine still running?
should workflow resume, reconcile, or fail safe?

A real recovery strategy often includes a reconciliation phase:

load persisted workflow state
query external reality
- machine actual state
- files present
- pending commands
- sensor statuses
compare expected vs actual
choose recovery transition

Example: Persisted state says Running, but machine says Idle.

That means one of these:

workflow state is stale
machine restarted independently
inspection ended unexpectedly
communication was lost

You should not blindly resume. You need recovery logic.
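Here is a sketch of the "choose recovery transition" step as an explicit policy table. The enums and the specific mappings are assumptions; the right choices are domain-specific. For example, this version asks an operator when the persisted state says Running but the machine reports Idle.

```csharp
using System;

public enum PersistedState { Idle, Running, Completed }
public enum ObservedMachine { Idle, Running, Unreachable }
public enum RecoveryAction { Resume, MarkIdle, RequireOperator, WaitForConnection }

// Hypothetical reconciliation policy: compare the last durable state with what the
// machine reports now, and pick an explicit recovery path instead of blindly resuming.
public static class Reconciler
{
    public static RecoveryAction Decide(PersistedState persisted, ObservedMachine observed) => (persisted, observed) switch
    {
        (_, ObservedMachine.Unreachable) => RecoveryAction.WaitForConnection,
        (PersistedState.Running, ObservedMachine.Running) => RecoveryAction.Resume,
        // Expected Running, machine Idle: the run may have ended or the machine restarted; we cannot tell which.
        (PersistedState.Running, ObservedMachine.Idle) => RecoveryAction.RequireOperator,
        // Machine is running something we have no active record of: unsafe to guess.
        (PersistedState.Idle or PersistedState.Completed, ObservedMachine.Running) => RecoveryAction.RequireOperator,
        _ => RecoveryAction.MarkIdle,
    };
}
```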

Persistence patterns

Snapshot persistence

Store the latest full state.

Good:

simple
easy recovery

Bad:

limited audit trail
harder to understand how you got there

Event sourcing

Store all events and rebuild state by replay.

Good:

full history
strong auditability
good for debugging and business traceability

Bad:

more complexity
replay cost
schema/versioning complexity
harder operational model

For most industrial or operational workflows, a hybrid is common:

persist current snapshot
also log state transition history

That gives both fast recovery and decent auditability.
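In the hybrid approach, the snapshot gives fast recovery, and the transition history can rebuild or cross-check it. Below is a minimal replay sketch over a hypothetical `TransitionRecord` log. Each record must start from the state that replay has reached, so a gap or reordering fails loudly instead of producing a plausible wrong answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum LotState { Idle, Running, Completed, Error }

// One append-only row of transition history.
public sealed record TransitionRecord(long Sequence, LotState From, string Event, LotState To);

public static class LotHistory
{
    // Rebuilds state by folding the history in sequence order, starting from 'initial'.
    // Every record must start where replay currently is; otherwise the log is inconsistent.
    public static LotState Replay(IEnumerable<TransitionRecord> history, LotState initial = LotState.Idle) =>
        history.OrderBy(r => r.Sequence).Aggregate(initial, (state, r) =>
            r.From == state
                ? r.To
                : throw new InvalidOperationException(
                    $"History inconsistent at #{r.Sequence}: replay is at {state}, record starts at {r.From}"));
}
```

Comparing `Replay(history)` with the stored snapshot after a restart is a cheap consistency check.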

PART 7 — ERROR STATES & RECOVERY

Many teams model happy path carefully and treat errors as “special cases.” That is a mistake.

In real systems, failure is part of the workflow.

Modeling failure states

Error should not just be an exception. Often it should be a state.

Examples:

MachineError
ValidationFailed
CommunicationLost
RecoveryRequired
PausedForOperator
RetryPending

Why model failure as state?

Because once failure happens, the system changes behavior:

UI options change
retries become available
certain operations are blocked
manual intervention may be required
recovery flow becomes explicit

This is much better than “catch exception and show message.”

Recovery transitions

A good workflow explicitly defines how to leave error states.

Examples:

CommunicationLost + ReconnectSucceeded -> Idle
ValidationFailed + CorrectRecipe -> Ready
MachineError + ResetAcknowledged -> Idle
RetryPending + RetryRequested -> Preparing

Recovery should not be magical. Operators and support engineers need to know what path exists.

Retry vs fail-fast design

This is a domain decision.

Retry

Use retry when failure is transient:

network timeout
device temporarily busy
file lock
short communication hiccup

But retries need boundaries:

max attempts
backoff
timeout
escalation to error state

Blind retries can hide real faults and make recovery harder.
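Here is a sketch of retries with the boundaries listed above: a maximum attempt count, exponential backoff, and escalation to an explicit error outcome. Treating only `TimeoutException` as transient is an assumption. Any other exception propagates immediately, which is the fail-fast path. An overall timeout is left out for brevity.

```csharp
using System;
using System.Threading.Tasks;

public enum CommOutcome { Connected, CommunicationLost }

public static class BoundedRetry
{
    // Runs 'operation' up to maxAttempts times, doubling the delay after each transient failure.
    // Only TimeoutException is treated as transient; any other exception propagates (fail-fast).
    public static async Task<(CommOutcome Outcome, int Attempts)> RunAsync(
        Func<int, Task> operation, int maxAttempts, TimeSpan baseDelay)
    {
        for (var attempt = 1; attempt <= maxAttempts; attempt++)
        {
            try
            {
                await operation(attempt);
                return (CommOutcome.Connected, attempt);
            }
            catch (TimeoutException)
            {
                if (attempt == maxAttempts)
                    break; // budget exhausted
                // Exponential backoff: baseDelay, 2x, 4x, ...
                await Task.Delay(baseDelay * Math.Pow(2, attempt - 1));
            }
        }
        // Escalate to an explicit error state instead of retrying forever.
        return (CommOutcome.CommunicationLost, maxAttempts);
    }
}
```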

Fail-fast

Use fail-fast when continuing is dangerous or corrupting:

inconsistent machine position
recipe mismatch
safety interlock triggered
invalid calibration
duplicate workflow ID
invariant broken

In industrial or safety-adjacent systems, fail-fast is often the safer choice.

A senior engineer asks: “Is the cost of false progress worse than the cost of stopping?”

Very often, yes.

PART 8 — PERFORMANCE & COMPLEXITY

State machines are conceptually clean, but in large systems the number of states and transitions can grow quickly.

State explosion problem

State explosion happens when you try to encode too many dimensions into one flat state enum.

Example: A workflow depends on:

machine mode
connection status
inspection phase
user authorization
safety state

If you flatten everything, you get monstrosities like:

RunningConnectedAuthorizedSafe
RunningDisconnectedAuthorizedSafe
PausedConnectedUnauthorizedSafe

That is not maintainable.

This usually means you are mixing multiple independent dimensions into one machine.

Managing complexity in large workflows

1. Separate orthogonal concerns

Do not put everything into one state machine.

Examples:

machine connection state
inspection workflow state
UI interaction state
authorization state

These are related, but they are not necessarily the same machine.

2. Use hierarchical state modeling

Instead of one giant flat model:

Operational
- Idle
- Preparing
- Running
- Paused
Faulted
- RecoverableFault
- FatalFault

This reduces duplication.
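A minimal way to get the benefit of the hierarchy without a library is to store child-to-parent links and let rules target a parent state. The `HState` enum and the `OnFault` rule are hypothetical. One rule on `Operational` replaces four near-identical rules on its children.

```csharp
using System;
using System.Collections.Generic;

public enum HState { Operational, Idle, Preparing, Running, Paused, Faulted, RecoverableFault, FatalFault }

public static class StateHierarchy
{
    // Child -> parent links for the tree above; Operational and Faulted are top-level.
    private static readonly Dictionary<HState, HState> Parent = new()
    {
        [HState.Idle] = HState.Operational,
        [HState.Preparing] = HState.Operational,
        [HState.Running] = HState.Operational,
        [HState.Paused] = HState.Operational,
        [HState.RecoverableFault] = HState.Faulted,
        [HState.FatalFault] = HState.Faulted,
    };

    // True if 'state' equals 'ancestor' or is nested anywhere beneath it.
    public static bool IsIn(HState state, HState ancestor)
    {
        HState? current = state;
        while (current is HState s)
        {
            if (s == ancestor)
                return true;
            current = Parent.TryGetValue(s, out var parent) ? parent : (HState?)null;
        }
        return false;
    }

    // One rule on the parent instead of one per child: a fault in any Operational substate is recoverable.
    public static HState OnFault(HState current) =>
        IsIn(current, HState.Operational) ? HState.RecoverableFault : current;
}
```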

3. Use sub-workflows

A large workflow often contains smaller workflows:

job loading
calibration
inspection execution
result export

Each can have its own state machine.

4. Keep transition rules close to the model

If transition logic is scattered across:

UI
service layer
hardware callbacks
background jobs
DB triggers

you no longer really have a state machine. You have a state rumor.

5. Favor explicitness over cleverness

A workflow engine nobody can read is worse than a boring explicit one.

PART 9 — COMMON LOW-LEVEL PITFALLS

These are the bugs that repeatedly show up in production systems.

Implicit transitions

This is when state changes happen as side effects in random places.

Example:

machine callback directly sets state to Running
timeout handler directly sets state to Error
UI handler directly sets state to Idle

Now no one knows the full transition graph.

This destroys reasoning and observability.

Rule: There should be one authoritative path for state transitions.

Duplicated logic

Example:

UI checks whether Start is allowed
application service checks again
hardware coordinator checks again
workflow object checks a different version

Now behavior diverges.

The UI says Start is enabled. The backend rejects it. Logs say “invalid state.” Operator gets confused.

Rule: The workflow model should be the source of truth. UI should derive from it, not invent rules separately.
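This sketch uses a single rule source that both the backend and the UI consult, so the Start button and the service cannot disagree. `PanelRules` and its command set are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum PanelState { Idle, Running, Paused, Error }
public enum PanelCommand { Start, Pause, Resume, Stop, Reset }

public static class PanelRules
{
    // Single source of truth for which commands are legal in which state.
    private static readonly Dictionary<PanelState, PanelCommand[]> Allowed = new()
    {
        [PanelState.Idle] = new[] { PanelCommand.Start },
        [PanelState.Running] = new[] { PanelCommand.Pause, PanelCommand.Stop },
        [PanelState.Paused] = new[] { PanelCommand.Resume, PanelCommand.Stop },
        [PanelState.Error] = new[] { PanelCommand.Reset },
    };

    // The backend calls this before executing a command.
    public static bool CanExecute(PanelState state, PanelCommand command) =>
        Allowed.TryGetValue(state, out var commands) && commands.Contains(command);

    // The UI derives button enablement from the same rules instead of re-implementing them.
    public static IReadOnlyDictionary<PanelCommand, bool> ButtonStates(PanelState state) =>
        Enum.GetValues<PanelCommand>().ToDictionary(c => c, c => CanExecute(state, c));
}
```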

Inconsistent state sources

This is a major real-world problem.

You may have:

in-memory current state
DB persisted state
machine-reported state
UI displayed state

If they disagree, what is authoritative?

Example:

DB says Paused
machine says Running
UI says Stopping

That is not a coding bug anymore. That is an operational incident.

Senior systems define:

authoritative state
observed state
derived state

For example:

authoritative workflow state = application workflow engine
observed machine state = hardware feedback
derived UI state = projection of workflow + machine + permissions

That separation helps a lot.

Hidden transition side effects

A transition is not just a state change. It often triggers:

command dispatch
notifications
persistence
audit log
UI updates
metrics

If those are mixed directly inside transition code without structure, testing becomes hard and recovery becomes fragile.

A better pattern is:

transition decision returns new state + effects
effect executor performs side effects
failures are fed back as explicit events

That is much closer to robust workflow design.

PART 10 — SENIOR ENGINEER MENTAL MODEL

This is the most important part.

Senior engineers do not think about state machines as diagrams first. They think about correctness.

How to reason about system correctness via state

A useful mindset is:

1. What states exist?

Not just names, but meanings.

For each state, ask:

what does this state mean operationally?
what is allowed?
what must be true?

Example: Running means:

active job exists
machine session established
input stream accepted
stop/pause allowed
start not allowed

That is much better than “Running is when it runs.”

2. What events can happen?

Include all real inputs:

user actions
external signals
timeouts
failures
retries
cancellations

3. What transitions are legal?

For each state + event:

next state?
or reject?
with what reason?

4. What invariants must always hold?

This is where correctness lives.

5. What side effects happen on transition?

And what if they fail?

6. What happens after restart?

If you cannot answer this, the model is incomplete.

How to design safe workflows

A safe workflow design usually has these properties:

Explicit authority

One place decides state transitions.

Serialized mutation

Avoid concurrent mutation of workflow state unless you have a very strong reason.

Durable checkpoints

Persist enough state to recover.

Explicit failures

Failure paths are modeled, not improvised.

Observable transitions

Every transition should be loggable and inspectable.

A transition log should ideally include:

workflow ID
old state
event
new state
reason / guard result
correlation ID
timestamp

That turns debugging from archaeology into engineering.
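Here is a sketch of a structured transition log entry that carries the fields listed above. Each entry is serialized with `System.Text.Json` as one JSON object per line. The record name and field types are assumptions.

```csharp
using System;
using System.Text.Json;

// Hypothetical structured log entry with the fields listed above.
public sealed record TransitionLogEntry(
    string WorkflowId,
    string OldState,
    string Event,
    string NewState,
    bool Accepted,
    string Reason,
    string CorrelationId,
    DateTimeOffset Timestamp);

public static class TransitionLog
{
    // One JSON object per line keeps entries greppable and machine-parseable.
    public static string Format(TransitionLogEntry entry) => JsonSerializer.Serialize(entry);
}
```

Logging rejected transitions too (with `Accepted = false` and the guard reason) is what makes the "what event was processed next?" question answerable later.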

How to debug state-related production issues

When debugging a production issue, think in this order:

1. What was the last known good state?

Find the last valid transition.

2. What event was processed next?

Was it expected, duplicated, out of order, or stale?

3. Did the transition violate invariants?

If yes, bug is probably in transition logic or recovery logic.

4. Did concurrent handlers race?

Look for:

overlapping commands
multiple event sources
duplicate callbacks
unsynchronized writes

5. Did state and side effects diverge?

Example: state changed to Completed, but file export failed. Now what does “Completed” even mean?

6. Did persistence lag reality?

Maybe in-memory state moved but DB did not, or vice versa.

This is why transition history and correlation IDs matter so much.

Practical .NET Design Guidance

If you are implementing this in .NET, a strong default approach for many business and industrial workflows is:

represent state with enum or immutable record
centralize transitions in a workflow engine/service
use explicit transition methods or transition table
serialize event processing per workflow instance
persist durable state after accepted transitions
log every transition
model error/recovery states explicitly

A practical shape might be:

csharp

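using System;

// Assumed minimal definitions for the types the engine below relies on.
public enum WorkflowStatus { Idle, Preparing, Running, Paused, Completed, Error }

public sealed record WorkflowState(
    WorkflowStatus Status,
    string? JobId,
    int Version,
    string? ErrorCode);

public sealed record TransitionResult(bool Accepted, WorkflowState NewState, string? Reason = null);

public interface IWorkflowEvent { }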

public sealed class WorkflowEngine
{
    private readonly object _sync = new();

    public WorkflowState Current { get; private set; }

    public WorkflowEngine(WorkflowState initialState)
    {
        Current = initialState;
    }

    public TransitionResult Handle(IWorkflowEvent evt)
    {
        lock (_sync)
        {
            var result = Decide(Current, evt);

            if (!result.Accepted)
                return result;

            ValidateInvariants(result.NewState);
            Current = result.NewState;
            return result;
        }
    }

    private TransitionResult Decide(WorkflowState state, IWorkflowEvent evt)
    {
        // Explicit transition logic here
        throw new NotImplementedException();
    }

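    private void ValidateInvariants(WorkflowState state)
    {
        // Invariant checks here; throwing rejects the transition before Current changes.
        if (state.Status == WorkflowStatus.Running && state.JobId is null)
            throw new InvalidOperationException("Running requires a JobId.");
    }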
}

Then keep side effects outside the pure decision logic as much as possible.

That gives you:

testable transition logic
explicit correctness checks
safer concurrency
clearer recovery design

Final Interview-Level Takeaway

The most mature answer in an interview is not:

“Use a state machine library.”

It is:

“A workflow is safe only when the system has explicit states, explicit transitions, enforced invariants, controlled concurrency, durable recovery, and observable failure paths. The real design challenge is not representing the current state as an enum. The real challenge is making transitions authoritative, deterministic, thread-safe, recoverable, and operationally debuggable.”

That is the difference between code that works in a demo and systems that survive production.
